Strandbox¶

Dataset¶

The dataset consists of scientific articles from three journals:

  1. Environmental Innovation and Societal Transitions (EIST)
  2. Research in the Sociology of Organizations (RSOG)
  3. Sustainability Science (Sus-Sci)
Journal    # Articles before preprocessing    # Articles after preprocessing
EIST       683                                574
RSOG       659                                639
Sus-Sci    1191                               1121
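To make the effect of preprocessing easier to compare across journals, the counts above can be turned into retention percentages. This is a minimal sketch; the helper name `retention_pct` is illustrative and not part of the original notebook.

```python
# Article counts copied from the table above: (before, after) preprocessing.
counts = {
    "EIST": (683, 574),
    "RSOG": (659, 639),
    "Sus-Sci": (1191, 1121),
}

def retention_pct(before, after):
    """Share of articles kept by preprocessing, in percent (one decimal)."""
    return round(100 * after / before, 1)

# retention_pct is a hypothetical helper introduced for this illustration.
retained = {journal: retention_pct(b, a) for journal, (b, a) in counts.items()}
for journal, pct in retained.items():
    print(f"{journal}: {pct}% retained")
```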
In [14]:
import json
import pandas as pd
In [58]:
data_path = 'data/extract_EIST.json'
with open(data_path, 'r') as fd:
    data = json.load(fd)
df = pd.DataFrame(data).T
df.head()
Out[58]:
file_name doi title abstract text location year authors
1 -It-s-not-talked-about---The-risk-of-failure-_... 10.1016/j.eist.2020.02.008 “It's not talked about”: The risk of failure i... Scholars of sustainability transition have giv... {'Introduction': ' A transition away from the ... UK 2020 [Beck Collins]
2 -Making-energy-transition-work---Bricolage-_20... 10.1016/j.eist.2020.07.005 “Making energy transition work”: Bricolage in ... In the quest for energy transition pathways, e... {'Introduction': ' Local energy transitions ha... Austria 2020 [Johannes Suitner, Martha Ecker, T U Wien]
3 1-s2.0-S2210422419302618-main 10.1016/j.eist.2019.10.005 Thinking about individual actor-level perspect... The 2019 STRN research agenda identifies conne... {'Introduction: background and rationale': ' T... Germany 2020 [Paul Upham, Paula Bögel, Elisabeth Dütschke]
4 1-s2.0-S2210422419302850-main 10.1016/j.eist.2019.11.008 Not more but different: A comment on the trans... The sustainability transitions research networ... {'Introduction': ' The comprehensive agenda fo... UK 2020 [Debbie Hopkins, Johannes Kester, Toon Meelen,...
5 1-s2.0-S2210422420300277-main 10.1016/j.eist.2020.02.001 Let's focus more on negative trends: A comment... Much has been written on sustainability transi... {'Introduction': ' The analysis of sustainabil... UK 2020 [Miklós Antal, Giulio Mattioli, Imogen Rattle,...
In [60]:
df.loc['1','abstract']
Out[60]:
'Scholars of sustainability transition have given much attention to local experiments in ‘protected spaces’ where system innovations can be initiated and where learning about those innovations can occur. However, local project participants’ conceptions of success are often different to those of transition scholars; where scholars see a successful learning experience, participants may see a project which has failed to “deliver”. This research looks at two UK case studies of energy retrofit projects – Birmingham Energy Savers and Warm Up North, both in the UK, and the opportunities they had for learning. The findings suggest that perceptions of failure and external real world factors reducing the capacity to experiment, meant that opportunities for learning were not well capitalised upon. This research makes a contribution to the sustainability transitions literature which has been criticised for focusing predominantly on successful innovation, and not on the impact of failure. © 2020 Elsevier B.V.'
In [61]:
data_path = 'data/prepro_EIST.json'
with open(data_path, 'r') as fd:
    data = json.load(fd)
df = pd.DataFrame(data)
In [62]:
df.head()
Out[62]:
title abstract text id time class
0 “It's not talked about”: The risk of failure i... Scholars of sustainability transition have giv... scholar sustainability transition given attent... 0 2020 EIST
1 “It's not talked about”: The risk of failure i... Scholars of sustainability transition have giv... transition use fossil fuel heat power ineffici... 0 2020 EIST
2 “It's not talked about”: The risk of failure i... Scholars of sustainability transition have giv... provide obstacle transfer learning following h... 0 2020 EIST
3 “It's not talked about”: The risk of failure i... Scholars of sustainability transition have giv... multi level perspective socio technical transi... 0 2020 EIST
4 “It's not talked about”: The risk of failure i... Scholars of sustainability transition have giv... research based comparative case study local ex... 0 2020 EIST
In [ ]:
df['text'][0]
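The first rows of the preprocessed frame share one title and `id` but carry different `text` values, which suggests that each row holds one chunk of an article. Under that assumption (the notebook does not state it), the sketch below counts articles and chunks and rebuilds one text per article. The toy frame `df_demo` and its values are made up for illustration.

```python
import pandas as pd

# Toy rows mimicking the preprocessed layout; all values are invented.
df_demo = pd.DataFrame({
    "id": [0, 0, 0, 1, 1],
    "text": ["chunk a", "chunk b", "chunk c", "chunk d", "chunk e"],
    "time": [2020, 2020, 2020, 2020, 2020],
    "class": ["EIST"] * 5,
})

# Number of distinct articles (assumes `id` identifies an article).
n_articles = df_demo["id"].nunique()

# Number of text chunks per article.
chunks_per_article = df_demo.groupby("id").size()

# One concatenated text per article, in original row order.
full_text = df_demo.groupby("id")["text"].agg(" ".join)

print(n_articles)
print(chunks_per_article)
```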

Topic modelling¶

1. LDA Optimal number of topics¶

In [17]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

img = mpimg.imread('disp/3journals_optim.png')
plt.imshow(img)
plt.axis('off')
Out[17]:
(-0.5, 575.5, 431.5, -0.5)

2. Topic Network¶

In [18]:
from IPython.display import IFrame

IFrame(src='disp/topic_network_3_journals_antons.html', width=900, height=700)
Out[18]:

2.1 Network Centrality¶

In [19]:
df = pd.read_csv('disp/centrality_full_network.csv')
df.sort_values(by=['Degree Centrality'], ascending=False).head(7)
Out[19]:
Unnamed: 0 Topic Degree Centrality Degree per Article Betweenness Centrality Betweenness per Article Clustering
0 0 0_interview_data_conducted_participant 59.430657 0.716032 2213.512033 26.668820 0.130918
5 5 5_transformation_actor_change_transition 38.277372 0.797445 806.733079 16.806939 0.180654
9 9 9_emission_scenario_reduction_carbon 36.262774 0.614623 936.007673 15.864537 0.163492
2 2 2_complexity_system_approach_process 29.211679 0.561763 470.744724 9.052783 0.182266
137 137 137_game_approach_process_change 25.182482 1.144658 213.092696 9.686032 0.303333
4 4 4_area_land_water_scenario 23.167883 0.413712 288.320436 5.148579 0.245059
15 15 15_forest_land_scenario_deforestation 23.167883 0.772263 252.689720 8.422991 0.245059
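The "per Article" columns appear to be the raw centrality divided by the number of articles assigned to the topic. The sketch below checks this for three topics, taking the article counts from the `count` column shown in the Temporal Trajectory section. That this is how the CSV was produced is an assumption; the notebook does not show the computation.

```python
import pandas as pd

# Centrality values copied from the table above; `count` values copied from
# the temporal trajectory table later in the notebook.
topics = pd.DataFrame({
    "Topic": [
        "0_interview_data_conducted_participant",
        "2_complexity_system_approach_process",
        "4_area_land_water_scenario",
    ],
    "Degree Centrality": [59.430657, 29.211679, 23.167883],
    "Betweenness Centrality": [2213.512033, 470.744724, 288.320436],
    "count": [83, 52, 56],
})

# Assumed normalisation: divide each centrality by the topic's article count.
topics["Degree per Article"] = topics["Degree Centrality"] / topics["count"]
topics["Betweenness per Article"] = topics["Betweenness Centrality"] / topics["count"]

print(topics.round(6))
```

For these three topics the results agree with the displayed columns to the precision shown.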

2.2 Topic Co-occurrence Distribution¶

In [20]:
import plotly.express as px
df = pd.read_csv('disp/edge_weight_dist_full_network.csv')
fig = px.pie(df, values='%', names='Edge Weight')
fig.show()
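The pie chart reads a precomputed file with the columns `Edge Weight` and `%`. A minimal sketch of how such a distribution could be derived from a list of edge weights follows; the weights are hypothetical and the notebook does not show how the CSV was built.

```python
import numpy as np
import pandas as pd

# Hypothetical topic co-occurrence edge weights (one entry per edge).
edge_weights = np.array([1, 1, 1, 2, 2, 3, 1, 1, 2, 5])

# Count how many edges carry each distinct weight.
values, counts = np.unique(edge_weights, return_counts=True)

# Same column names as the CSV used for the pie chart.
dist = pd.DataFrame({
    "Edge Weight": values,
    "%": 100 * counts / counts.sum(),
})

print(dist)
```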

3. Topics Landscape¶

In [21]:
df = pd.read_csv('disp/3_journals_topics.csv')
df = df.rename(columns={'Volumne': 'Volume'})
df.head()
Out[21]:
Unnamed: 0 Topic Label topic_nr most_freq_words rep_doc_year title Volume Authors
0 1 0_interview_data_conducted_participant 0 ['interview', 'data', 'conducted', 'participan... 2020 Sharing among neighbours in a Norwegian suburb 37 Westskog H., Aase T.H., Standal K., Tellefsen S.
1 2 1_university_student_school_stanford 1 ['university', 'student', 'school', 'stanford'... 2010 Chapter 23: The Stanford organizational studie... 28 Meyerson D.E.
2 3 2_complexity_system_approach_process 2 ['complexity', 'system', 'approach', 'process'... 2020 SHIFT IN HYBRIDITY IN RESPONSE TO ENVIRONMENTA... 69 Ramus T., Vaccaro A., Versari P., Brusoni S.
3 4 3_sustainability_research_student_science 3 ['sustainability', 'research', 'student', 'sci... 2021 The patterns of curriculum change processes th... 16.0 Weiss M., Barth M., von Wehrden H.
4 5 4_area_land_water_scenario 4 ['area', 'land', 'water', 'scenario', 'forest'... 2019 The seasonal and scale-dependent associations ... 14.0 Aiba M., Shibata R., Oguro M., Nakashizuka T.

4. Topics vs Documents¶

In [22]:
plt.figure(figsize = (35,30))
img = mpimg.imread('disp/3_journals.png')
plt.imshow(img, aspect='auto')
plt.axis('off')
Out[22]:
(-0.5, 1999.5, 1499.5, -0.5)

5. Hierarchical Plots¶

EIST Topics¶

In [23]:
from IPython.display import IFrame

IFrame(src='disp/hierarchical_topics_eist.html', width=1000, height=1000)
Out[23]:

RSOG Topics¶

In [24]:
from IPython.display import IFrame

IFrame(src='disp/hierarchical_topics_rsog.html', width=1000, height=1000)
Out[24]:

Sus-Sci Topics¶

In [25]:
from IPython.display import IFrame

IFrame(src='disp/hierarchical_topics_sus_sci.html', width=1000, height=1200)
Out[25]:

Combined 3 journals¶

In [26]:
from IPython.display import IFrame

IFrame(src='disp/hierarchical_topics_3_journals.html', width=1000, height=1400)
Out[26]:

Insights¶

1. Descriptive Statistics¶

In [27]:
df = pd.read_csv('disp/descriptive_stats_3_journals.csv')
df.head()
Out[27]:
Unnamed: 0 Topic Label standardized_mean max min
0 0 0_interview_data_conducted_participant 3.637 0.560 0.0
1 1 1_university_student_school_stanford 1.789 0.951 0.0
2 2 2_complexity_system_approach_process 1.542 0.630 0.0
3 3 3_sustainability_research_student_science 1.447 0.863 0.0
4 4 4_area_land_water_scenario 3.686 0.854 0.0

2. Temporal Trajectory¶

In [28]:
df = pd.read_csv('disp/3_journals_temp_dev_trajc.csv')
df.head()
Out[28]:
Unnamed: 0 topic_label count year_mean year_std year_min year_max coeff_linear coeff_quadratic coeff_linear_of_quadratic
0 0 0_interview_data_conducted_participant 83 2015.153846 5.446893 2001 2022 0.7577 0.0673 0.7577
1 1 1_university_student_school_stanford 17 2015.500000 3.905125 2010 2021 -0.8770 0.1167 -0.8770
2 2 2_complexity_system_approach_process 52 2016.083333 4.050892 2009 2022 0.4452 0.0382 0.4452
3 3 3_sustainability_research_student_science 36 2015.692308 4.120952 2009 2022 0.2857 0.0407 0.2857
4 4 4_area_land_water_scenario 56 2015.076923 4.730613 2007 2022 0.4114 0.0731 0.4114
In [29]:
import numpy as np
df['year_mean'] = np.around(df['year_mean'], 3)
df['year_std'] = np.around(df['year_std'], 3)
In [30]:
df.drop(['coeff_linear_of_quadratic'], axis=1).to_csv('3_journals_temp_dev_trajc.csv')
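The column names `coeff_linear` and `coeff_quadratic` suggest polynomial fits of a topic's yearly weight against time. The sketch below shows one way to obtain such coefficients with `numpy.polyfit`. The yearly counts are invented, and whether the original analysis used raw counts, shares, or standardised values is not shown in the notebook.

```python
import numpy as np

# Hypothetical yearly article counts for a single topic.
years = np.array([2015, 2016, 2017, 2018, 2019, 2020])
counts = np.array([2.0, 3.5, 6.0, 9.5, 14.0, 19.5])

# Shift years so the fit is numerically well conditioned.
x = years - years.min()

# Degree-1 fit: returns [slope, intercept].
slope, intercept = np.polyfit(x, counts, 1)

# Degree-2 fit: returns [quadratic, linear, constant] coefficients.
quad, lin, const = np.polyfit(x, counts, 2)

print(f"linear slope: {slope:.4f}")
print(f"quadratic term: {quad:.4f}, linear term of quadratic fit: {lin:.4f}")
```

Note that in the rows displayed above, `coeff_linear_of_quadratic` equals `coeff_linear`, whereas a separate quadratic fit generally yields a different linear term, as the sketch illustrates.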

3. Temporal Trends¶

3.1. Hot Topics¶

In [31]:
from IPython.display import IFrame

IFrame(src='disp/3_journals_hot.html', width=1000, height=600)
Out[31]:

3.2. Cold Topics¶

In [32]:
from IPython.display import IFrame

IFrame(src='disp/3_journals_cold.html', width=1000, height=600)
Out[32]:

3.3. Reviving Topics¶

In [33]:
from IPython.display import IFrame

IFrame(src='disp/3_journals_reviving.html', width=1000, height=600)
Out[33]:

3.4. Evergreen Topics¶

In [34]:
from IPython.display import IFrame

IFrame(src='disp/3_journals_evergreen.html', width=1000, height=600)
Out[34]:

3.5. Wallflower Topics¶

In [35]:
from IPython.display import IFrame

IFrame(src='disp/3_journals_wallflowers.html', width=1000, height=600)
Out[35]:

Author Discourse Network Mapping¶

In [36]:
from IPython.display import IFrame

IFrame(src='disp/continous_color.html', width=900, height=600)
Out[36]:

1. Author Collaboration Network¶

In [37]:
from IPython.display import IFrame

IFrame(src='disp/colab_network.html', width=1000, height=600)
Out[37]:

2. Author Collaboration within their own community¶

In [38]:
from IPython.display import IFrame

IFrame(src='disp/within_form_colab_network.html', width=1000, height=600)
Out[38]:

3. Author Collaboration outside their community¶

In [39]:
from IPython.display import IFrame

IFrame(src='disp/outside_form_colab_network.html', width=1000, height=600)
Out[39]:
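The within-community and outside-community views can be thought of as a partition of the co-authorship edges. The sketch below splits an edge list according to whether both authors carry the same community label. The author names, labels, and the helper `split_edges` are hypothetical; the notebook only embeds the rendered networks.

```python
# Hypothetical author -> community assignment.
community = {"A": "EIST", "B": "EIST", "C": "RSOG", "D": "Sus-Sci"}

# Hypothetical co-authorship edges (pairs of authors).
edges = [("A", "B"), ("A", "C"), ("C", "D"), ("B", "A")]

def split_edges(edges, community):
    """Partition edges into within-community and cross-community lists."""
    within, outside = [], []
    for u, v in edges:
        if community[u] == community[v]:
            within.append((u, v))
        else:
            outside.append((u, v))
    return within, outside

within, outside = split_edges(edges, community)
print("within:", within)
print("outside:", outside)
```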

4. Discovering a new discourse community¶

In [40]:
from IPython.display import IFrame

IFrame(src='disp/new_discourse.html', width=1000, height=600)
Out[40]:

5. Collaboration with the new discourse community¶

In [41]:
from IPython.display import IFrame

IFrame(src='disp/network_with_new_cluster.html', width=1000, height=600)
Out[41]:

6. Interstitial Community¶

In [42]:
#(0.33+-0.02, 0.33+-0.02, 0.33+-0.02)

from IPython.display import IFrame

IFrame(src='disp/network_with_interstitial_cluster.html', width=1000, height=600)
Out[42]:
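The comment in the cell above reads like a membership rule: three shares, each within 0.33 ± 0.02. Assuming these are an author's shares across the three communities (the notebook does not say so explicitly), a minimal check could look like this. The function name `is_interstitial` is illustrative.

```python
def is_interstitial(shares, center=0.33, tol=0.02):
    """Return True if every share lies within center +/- tol.

    `center` and `tol` mirror the values in the notebook comment; the
    interpretation of `shares` as community shares is an assumption.
    """
    return all(abs(s - center) <= tol for s in shares)

# An author spread almost evenly over three communities.
print(is_interstitial((0.34, 0.33, 0.33)))

# An author concentrated in one community.
print(is_interstitial((0.50, 0.30, 0.20)))
```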

Triad mapping based on Scientific, Associational and Managerial (Powell et al. 2017)¶

In [ ]:
"""
Associational : 
'social responsibility ,common good ,civil society ,core values ,humanitarianism ,altruism ,empowerment ,
political action ,positive action ,self interest ,moral character ,betterment ,social harmony ,advocates ,
acceptance ,accountability ,ethical behavior ,human decency ,ideals ,society ,compassion ,social stability ,
principles ,aims ,self interest ,conformity ,moral integrity ,individualism ,human dignity ,notion ,free thought ,
personal integrity ,ideology ,continued existence ,nonviolence ,individual freedom ,autonomy ,benevolence ,
noble goal ,non violence ,perpetuation ,advocacy ,religious faith ,social order ,furthering ,moral values ,
personal responsibility ,basic human ,personal agency ,very existence ,better society ,personal freedom ,discourse ,
governance ,social structure ,social change ,rationality ,social cohesion ,notions ,well being ,selfishness ,
ostensibly ,group identity ,institution ,impetus ,consequence ,morals ,human rights ,individuality ,
collective action ,personal benefit ,anathema ,activism ,strong belief ,value system ,inaction ,own sake ,
justification ,morality ,democratic values ,moral standing ,greater good ,pragmatism ,reaffirmation ,respect ,
immorality ,social equality ,social advancement ,inclusiveness ,subservience ,social reform ,undermining ,
individual autonomy ,power structure ,moral duty ,personal sense ,social movement ,undermine ,human nature ,
personal power ,accountability ,advancement ,advocacy ,awareness ,care ,charity ,commitment ,common good ,
compassion ,democracy ,development ,elimination ,emotion ,empowerment ,eradication ,ethics ,justice ,
make a difference ,mission ,moral ,motivation ,participation ,principles ,quality of life ,relationship ,
rights based ,social benefit  ,social change ,social movement ,social progress ,solidarity ,trust ,values ,
vision ,voice'
"""
In [ ]:
"""
Scientific :

'methodology ,statistical analysis ,analysis ,statistical data ,inference ,empirical data ,experimental data ,
extrapolation ,underlying assumptions ,predictive power ,abstract ,hypothesis ,correlations ,experimental results ,
analyses ,real data ,inferences ,empirical evidence ,observations ,quantification ,quantifiable data ,
statistical significance ,actual data ,methodologies ,empirically ,hypotheses ,available information ,validity ,
prior research ,empirical ,statistical methods ,statistical ,empirical research ,findings ,observation ,
quantitative data ,experimental design ,statistics ,evaluation ,data set ,basic assumptions ,scientific data ,
scientific study ,scientific analysis ,actual results ,heuristic ,supposition ,entire study ,first principles ,
criterion ,scientific process ,such studies ,objective data ,mathematical model ,observational data ,metrics ,
results ,regression analysis ,conclude ,logical reasoning ,objective analysis ,causal relationships ,study design ,
hard data ,quantifying ,statistical models ,qualitative data ,research findings ,demonstrate ,sufficient data ,
data sets ,data point ,concretely ,hypothesis testing ,theoretical model ,previous research ,relevant data ,
confidence intervals ,methods ,particular study ,p values ,extrapolations ,existing research ,Bayesian ,real world ,
study ,scientific results ,available evidence ,data points ,statistical model ,implications ,counterfactuals ,
empirical results ,quantified ,real world ,basis ,extrapolating ,assertion ,analysis ,assessment ,causality ,
control group ,correlation ,counterfactual ,criteria ,data ,design ,eligible population ,evaluation ,evidence ,
experiment ,framework ,identification strategy ,indicators ,informed philanthropy ,logical framework model ,
means of verification ,measurement ,measures ,meta analysis ,methodology ,proven strategy ,quantification ,
randomized control trials ,review ,root cause ,social impact analysis ,statistically significant ,survey ,tactics ,
target group ,theory of change ,treatment effects'

"""
In [ ]:
"""
Managerial :

'efficiency ,profitability ,metrics ,scalability ,efficiencies ,productivity ,optimization ,trade offs ,performance ,
resource allocation ,optimizing ,available resources ,cost/benefit analysis ,optimize ,business case ,
considerations ,HunterSmith ,cost benefit ,overall system ,cost benefit ,terms ,current approach ,inefficiencies ,
current level ,robustness ,overall benefit ,optimise ,implementation ,capabilities ,efficiency gains ,future growth ,
improved performance ,incrementally ,cost savings ,significant impact ,energy efficiency ,cost-benefit analysis ,
tangible benefits ,leveraging ,balancing act ,minimal impact ,overall performance ,important factors ,
expected performance ,significant cost ,other considerations ,resource use ,long-term stability ,prioritisation ,
constraints ,improvements ,bottom line ,optimising ,increased efficiency ,allocation ,further development ,
decision process ,marginal benefit ,trade off ,business process ,feasibility ,resource consumption ,sustainability ,
cost/benefit ratio ,crucially ,incentives ,prioritization ,processes ,long-term viability ,system performance ,
current environment ,key factor ,usability ,power use ,risk analysis ,actual goal ,little benefit ,system ,
marginal benefits ,performance metrics ,cost/benefit ,existing systems ,acceptable level ,factor ,potential impact ,
actual results ,long-term growth ,cost effectiveness ,competence ,capability ,intended goal ,critical part ,
complexity ,ROI ,effort ,immediate benefit ,administrative overhead ,benchmarks ,best practice ,bottom line ,
capacity ,certification ,constituent satisfaction ,cost benefit ,earned income ,effectiveness ,efficiency ,
exit strategy ,growth ,impact ,key performance indicators ,lessons learned ,leverage ,management ,market based ,
milestones ,monitoring and evaluation ,objectives ,optimization ,outcome ,output ,performance ,productivity ,
return on investment ,smart giving ,stakeholder satisfaction ,strategic ,SWOT ,transparency ,value proposition ,
venture philanthropy'

"""
In [ ]:
### List of Users: InnoEnergyEU, EUeic, EITUrbanMob, EITRawMaterials, EITManufactur, EITHealth, EITFood, EITeu, 
### EIT_Digital, ClimateKIC
In [45]:
from IPython.display import IFrame

IFrame(src='disp/InnoEnergyEU.html', width=700, height=600)
Out[45]:
In [46]:
from IPython.display import IFrame

IFrame(src='disp/EUeic.html', width=700, height=600)
Out[46]:
In [47]:
from IPython.display import IFrame

IFrame(src='disp/EITUrbanMob.html', width=700, height=600)
Out[47]:
In [48]:
from IPython.display import IFrame

IFrame(src='disp/EITRawMaterials.html', width=700, height=600)
Out[48]:
In [49]:
from IPython.display import IFrame

IFrame(src='disp/EITManufactur.html', width=700, height=600)
Out[49]:
In [50]:
from IPython.display import IFrame

IFrame(src='disp/EITHealth.html', width=700, height=600)
Out[50]:
In [51]:
from IPython.display import IFrame

IFrame(src='disp/EITFood.html', width=700, height=600)
Out[51]:
In [52]:
from IPython.display import IFrame

IFrame(src='disp/EITeu.html', width=700, height=600)
Out[52]:
In [53]:
from IPython.display import IFrame

IFrame(src='disp/EIT_Digital.html', width=700, height=600)
Out[53]:
In [54]:
from IPython.display import IFrame

IFrame(src='disp/ClimateKIC.html', width=700, height=600)
Out[54]:
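One simple way to place a text on the triad is to count matches against each of the three vocabularies listed in this section and normalise the counts to shares. The sketch below uses short excerpts of those vocabularies and plain whole-word matching. It is an illustration under those assumptions, not the scoring used to produce the plots above; `parse_vocab`, `count_hits`, and `triad_shares` are hypothetical helpers.

```python
import re

# Short excerpts of the three vocabularies from this section.
vocab_raw = {
    "associational": "social responsibility ,common good ,civil society ,empowerment ,solidarity ,trust",
    "scientific": "methodology ,statistical analysis ,hypothesis ,empirical evidence ,data ,survey",
    "managerial": "efficiency ,profitability ,scalability ,cost savings ,return on investment ,ROI",
}

def parse_vocab(raw):
    """Split a comma-separated vocabulary string into clean lowercase terms."""
    return [t.strip().lower() for t in raw.split(",") if t.strip()]

def count_hits(text, terms):
    """Count whole-word, case-insensitive occurrences of all terms in text."""
    text = text.lower()
    return sum(len(re.findall(r"\b" + re.escape(t) + r"\b", text)) for t in terms)

def triad_shares(text):
    """Return each category's share of all vocabulary hits (zeros if no hits)."""
    hits = {k: count_hits(text, parse_vocab(v)) for k, v in vocab_raw.items()}
    total = sum(hits.values())
    if total == 0:
        return {k: 0.0 for k in hits}
    return {k: n / total for k, n in hits.items()}

example = "We test the hypothesis with survey data to improve efficiency and build trust."
shares = triad_shares(example)
print(shares)
```

The three shares sum to one whenever at least one term matches, so they can be used directly as coordinates in a ternary plot.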

Vocab for State, Market and Community¶

In [55]:
state_vocab = """Sovereignty, Constitution, National Security, Foreign Relations, Diplomacy, International Law, Human Rights, Civil Liberties, Public Services, Infrastructure, Public Health, Public Safety, Social Security, Social Welfare, Public Education, Public Transportation, Taxation, Fiscal Policy, Regulatory Framework"""
state_vocab = [word.strip() for word in state_vocab.split(',') if word.strip()]
state_vocab
Out[55]:
['Sovereignty',
 'Constitution',
 'National Security',
 'Foreign Relations',
 'Diplomacy',
 'International Law',
 'Human Rights',
 'Civil Liberties',
 'Public Services',
 'Infrastructure',
 'Public Health',
 'Public Safety',
 'Social Security',
 'Social Welfare',
 'Public Education',
 'Public Transportation',
 'Taxation',
 'Fiscal Policy',
 'Regulatory Framework']
In [56]:
market_vocab = """Profits, Competition, Consumers, Supply Chain, Distribution, Pricing, Mergers & Acquisitions, Outsourcing, Globalization, Innovation, Technology, Intellectual Property, Risk Management, Branding, Advertising, Market Research, Market Share, Market Segmentation, Market Trends, Market Analysis"""
market_vocab = [word.strip() for word in market_vocab.split(',') if word.strip()]
market_vocab
Out[56]:
['Profits',
 'Competition',
 'Consumers',
 'Supply Chain',
 'Distribution',
 'Pricing',
 'Mergers & Acquisitions',
 'Outsourcing',
 'Globalization',
 'Innovation',
 'Technology',
 'Intellectual Property',
 'Risk Management',
 'Branding',
 'Advertising',
 'Market Research',
 'Market Share',
 'Market Segmentation',
 'Market Trends',
 'Market Analysis']
In [57]:
community_vocab = """Volunteers, Charities, Social Groups, Local Organizations, Non-Governmental Organizations, Faith-Based Organizations, Community Centers, Neighborhoods, Clubs, Activists, Advocates, Social Movements, Social Enterprises, Social Networks, Social Media, Fundraisers, Donors, Philanthropy, Collaboration, Empowerment, Inclusion, Social Justice"""
community_vocab = [word.strip() for word in community_vocab.split(',') if word.strip()]
community_vocab
Out[57]:
['Volunteers',
 'Charities',
 'Social Groups',
 'Local Organizations',
 'Non-Governmental Organizations',
 'Faith-Based Organizations',
 'Community Centers',
 'Neighborhoods',
 'Clubs',
 'Activists',
 'Advocates',
 'Social Movements',
 'Social Enterprises',
 'Social Networks',
 'Social Media',
 'Fundraisers',
 'Donors',
 'Philanthropy',
 'Collaboration',
 'Empowerment',
 'Inclusion',
 'Social Justice']